Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
Authors
Abstract
In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, Θ̃(√T), Θ(T^{2/3}), or Θ(T). We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
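The protocol described in the abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the loss and feedback tables, the function names `play` and `uniform_policy`, and the uniformly random learner are all hypothetical and chosen for illustration. It is not the algorithm proposed in the paper; it only shows how loss, feedback, and regret against the best fixed action in hindsight fit together when outcomes are drawn i.i.d.

```python
import random

# Hypothetical 2-action, 2-outcome game (illustrative, not from the paper).
# LOSS[a][o] is the loss of action a under outcome o.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# FEEDBACK[a][o] is the signal the learner sees; action 1 reveals nothing.
FEEDBACK = [["a", "b"],
            ["c", "c"]]


def play(policy, outcome_probs, horizon, seed=0):
    """Run the game for `horizon` rounds and return the learner's regret.

    Outcomes are drawn i.i.d. from `outcome_probs`. Regret is the learner's
    cumulative loss minus that of the best fixed action in hindsight.
    """
    rng = random.Random(seed)
    n_actions = len(LOSS)
    outcomes = list(range(len(outcome_probs)))
    history = []                       # (action, feedback) pairs only
    learner_loss = 0.0
    action_losses = [0.0] * n_actions  # bookkeeping hidden from the learner
    for _ in range(horizon):
        action = policy(history, rng)
        outcome = rng.choices(outcomes, weights=outcome_probs)[0]
        learner_loss += LOSS[action][outcome]
        for a in range(n_actions):
            action_losses[a] += LOSS[a][outcome]
        # The learner observes the feedback signal, not the outcome or loss.
        history.append((action, FEEDBACK[action][outcome]))
    return learner_loss - min(action_losses)


def uniform_policy(history, rng):
    """Placeholder learner that ignores feedback and picks uniformly."""
    return rng.randrange(len(LOSS))


if __name__ == "__main__":
    print(play(uniform_policy, [0.7, 0.3], horizon=1000))
```

A learner that ignores feedback, like `uniform_policy`, generally incurs regret that grows linearly in the horizon; the classification in the abstract concerns how much better the best possible learner can do for a given pair of loss and feedback tables.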
Similar resources
Toward a Classification of Finite Partial-Monitoring Games
In a finite partial-monitoring game against Nature, the Learner repeatedly chooses one of finitely many actions, Nature responds with one of finitely many outcomes, and the Learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the Learner is to minimize its total cumulative loss. We make progress towards classification ...
An adaptive algorithm for finite stochastic partial monitoring
We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both “easy” and “hard” problems. For easy problems, it additionally achieves logarithmic individual regret. Most importantly, the algorithm is adaptive in the sense that if the...
No Internal Regret via Neighborhood Watch
We present an algorithm which attains O(√T) internal (and thus external) regret for finite games with partial monitoring under the local observability condition. Recently, this condition has been shown by Bartók, Pál, and Szepesvári [4] to imply the O(√T) rate for partial monitoring games against an i.i.d. opponent, and the authors conjectured that the same holds for non-stochastic advers...
Non-trivial two-armed partial-monitoring games are bandits
We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is non-trivial, then it is reducible to a bandit-like game and thus the minimax regret is Θ(√T).
Regret Bounds and Minimax Policies under Partial Monitoring
This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient as well as four different notions of regret: pseudoregret, expected regret, high probability regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster) based on an arbitrary function ψ for which we propose a u...
Publication date: 2011